Skip to content

[SPARK-56744][INFRA] Document test base class hierarchy in AGENTS.md#55707

Open
zhengruifeng wants to merge 11 commits intoapache:masterfrom
zhengruifeng:add-test-base-class-guide
Open

[SPARK-56744][INFRA] Document test base class hierarchy in AGENTS.md#55707
zhengruifeng wants to merge 11 commits intoapache:masterfrom
zhengruifeng:add-test-base-class-guide

Conversation

@zhengruifeng
Copy link
Copy Markdown
Contributor

@zhengruifeng zhengruifeng commented May 6, 2026

What changes were proposed in this pull request?

Add a Scala Test Base Classes section to AGENTS.md that documents the layered Scala test base hierarchy in this repo and how to pick a base class for a new test suite. Spark uses the AnyFunSuite ScalaTest style throughout, and the chain is:

SparkFunSuite                                                           (core)
  <- PlanTest                                                           (sql/catalyst)
    <- QueryTest                                                        (sql/core)
      <- SharedSparkSession                                             (sql/core)

The new section also includes:

  • A decision table mapping test scope (plain JVM, Catalyst plans, abstract QueryTest, full SQL with session) to the right base.
  • A note that QueryTest declares spark: SparkSession abstractly via SparkSessionProvider, plus a session-provider table listing the two formal SparkSessionProvider implementations in the repo: SharedSparkSession (sql/core) and TestHiveSingleton (sql/hive).
  • A linearization gotcha: the first item in an extends clause must transitively extend a class. Pure helper traits (*ErrorsBase, *Helper) cannot be put first.

CLAUDE.md is a symlink to AGENTS.md, so this change is picked up by both AI agent toolchains.

Why are the changes needed?

Picking the wrong test base class (e.g. extending QueryTest directly when a session is needed, or SparkFunSuite when PlanTest would do) is a common stumble when adding new Scala test suites. The information is currently spread across the source of SparkFunSuite, PlanTest, QueryTest, and SharedSparkSession, with no single place that summarizes when to use which. Documenting it in AGENTS.md gives both contributors and AI coding agents a quick reference.

Does this PR introduce any user-facing change?

No. Documentation-only change to a developer/agent guide file.

How was this patch tested?

N/A. Documentation-only change; no code or tests are affected.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude opus-4-7

Add a "Test Base Classes" section to AGENTS.md describing the layered
SparkFunSuite -> PlanTest -> QueryTest -> SharedSparkSession hierarchy,
when to pick each base, common redundant-mixin patterns to avoid, and
the linearization gotcha when the first item in an `extends` clause is
a pure trait without the SparkFunSuite class chain.

Generated-by: Claude opus-4-7
…sion) in AGENTS.md

Clarify that QueryTest declares `spark` abstractly and that `SharedSparkSession` is one of several session-providing traits in the repo. List the common providers (`SharedSparkSession`, `TestHiveSingleton`, `RemoteSparkSession`) and note the rare manual-override pattern.

Generated-by: Claude opus-4-7
…ENTS.md

Add a "Base chain" column to the session provider table and explain that
RemoteSparkSession directly extends AnyFunSuite (not SparkFunSuite), so
Connect client suites must combine it with QueryTest to recover the
SparkFunSuite-only features (per-test timeout, gridTest, retry,
log-capture).

Generated-by: Claude opus-4-7
…S.md

Add SparkTestSuite as the bottom of the test base class chain — it
holds the common Spark test functionality (thread audit, fixed
timezone/locale, withTempDir, withLogAppender, checkError) and is
style-agnostic. SparkFunSuite is AnyFunSuite + SparkTestSuite. Mention
the alternative-style demonstration suites under core/.../*SparkTestSuite.scala
and update the linearization gotcha to note that SparkTestSuite is
itself a pure trait.

Generated-by: Claude opus-4-7
…TS.md

Drop the RemoteSparkSession row and follow-up paragraph from the session
provider section. The trait does not formally extend SparkSessionProvider
and serves a separate audience (Connect client tests under
sql/connect/client/jvm), so it is out of scope for this guide.

Generated-by: Claude opus-4-7
QueryTest extends SparkFunSuite with QueryTestBase with PlanTest, so
PlanTest is a parent of QueryTest. Indent QueryTest one level deeper
to reflect the actual hierarchy and show QueryTest as PlanTest +
QueryTestBase (PlanTest already brings the SparkFunSuite chain).

Generated-by: Claude opus-4-7
Add a parallel diagram for the style-agnostic `*Base` helper traits
(`PlanTestBase` -> `QueryTestBase` -> `SharedSparkSessionBase`) and note
that `PlanTest` / `QueryTest` / `SharedSparkSession` are the
FunSuite-flavored bundles built on top of them, analogous to
`SparkFunSuite` = `AnyFunSuite` + `SparkTestSuite`.

Generated-by: Claude opus-4-7
Drop the "manual session override", helper-trait combination, and
"common mistakes" subsections from the Test Base Classes section to
keep the guide focused on the layered base hierarchy and the
linearization gotcha.

Generated-by: Claude opus-4-7
Reorder the Test Base Classes section so the style-agnostic helper
traits (SparkTestSuite, PlanTestBase, QueryTestBase,
SharedSparkSessionBase) are introduced first, followed by the
class-bearing FunSuite-style bases that bundle each helper with
AnyFunSuite. The helpers are where the actual test functionality
lives, so leading with them makes the relationship clearer.

Generated-by: Claude opus-4-7
The section covers Scala test bases only (PySpark tests are handled
separately under Build and Test), so the more specific name avoids
confusion.

Generated-by: Claude opus-4-7
@zhengruifeng zhengruifeng marked this pull request as ready for review May 6, 2026 12:26
@zhengruifeng zhengruifeng requested a review from cloud-fan May 6, 2026 12:26
Comment thread AGENTS.md Outdated
SparkTestSuite (core; thread audit, fixed timezone/locale, withTempDir, withLogAppender, checkError)
PlanTestBase (sql/catalyst; plan-comparison helpers — comparePlans, normalizePlan)
<- QueryTestBase (sql/core; adds SQL/DataFrame helpers + abstract `spark` via `SparkSessionProvider`)
<- SharedSparkSessionBase (sql/core; provides the actual `TestSparkSession`)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmmm do we need to mention then? Inside Spark we should always pick AnyFunSuite style for consistency

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point — agreed. Dropped the style-agnostic *Base chain (and the related SparkTestSuite row + linearization-gotcha example) in 65bb5e6. The section now only documents the SparkFunSuite -> PlanTest -> QueryTest -> SharedSparkSession chain.

Spark uses the AnyFunSuite ScalaTest style throughout, so the
SparkTestSuite / PlanTestBase / QueryTestBase / SharedSparkSessionBase
chain is not relevant to contributors picking a base for a new test
suite. Collapse the section to just the class-bearing chain
(SparkFunSuite -> PlanTest -> QueryTest -> SharedSparkSession), drop
the SparkTestSuite row from the decision table, fold its helpers into
the SparkFunSuite row, drop the now-redundant base-chain column from
the session-provider table, and trim the linearization gotcha to no
longer cite SparkTestSuite / AnyWordSpec.

Generated-by: Claude opus-4-7
zhengruifeng added a commit to zhengruifeng/spark that referenced this pull request May 7, 2026
…les_for_files

### What changes were proposed in this pull request?

This PR extends `determine_modules_for_files` in `dev/sparktestsupport/utils.py` to
ignore `AGENTS.md` and `CONTRIBUTING.md` in addition to the existing `README.md`.

### Why are the changes needed?

A documentation-only PR that touches only `AGENTS.md` (e.g.
apache#55707) currently triggers all CI test jobs because
the file is not associated with any submodule, so it falls through to the `root`
module. Neither file affects code or tests, and neither is consumed by the docs build,
so they should be ignored just like `README.md`.

### Does this PR introduce _any_ user-facing change?

No, this is only a testing infra change.

### How was this patch tested?

Updated and ran the doctests in `dev/sparktestsupport/utils.py`.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Opus 4.7
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants